
NATIONAL BUREAU OF STANDARDS REPORT

NBS PROJECT 1103-40-11625        1 July 1959        NBS REPORT 6513


ON THE "SYNTHETIC RECORD" PROBLEM
(Estimation of the variance)

by 

Joan  Raup  Rosenblatt 
Statistical  Engineering  Laboratory 


Technical  Report  No.  1 
to 

Water  Resources  Division 
U.  S.  Geological  Survey 
Department  of  Interior 




Approved  for  public  release  by  the 
director  of  the  National  Institute  of 
Standards  and  Technology  (NIST) 
on  October  9,  2015 




U.  S.  DEPARTMENT  OF  COMMERCE 


NATIONAL  BUREAU  OF  STANDARDS 


Technical Note No. 1
1 July 1959


On  the  "Synthetic  Record"  Problem 
(Estimation  of  the  Variance) 

Joan  Raup  Rosenblatt 
National  Bureau  of  Standards 

1. Introduction

The "synthetic record" problem (my terminology) arises in the
following way. Discharge records are obtained for two streams in
the same geographical area, one record being longer than the
other. It is desired to find conditions under which the data in
the longer record can be used to improve the estimates of the
mean and variance of the discharge in the stream for which there
is a short record. Knowledge of these conditions would contribute
to (i) evaluation of the "quality" of estimated parameters of
discharge distributions and (ii) determination of criteria for the
establishment and continuation of stream-gaging stations.

The term "synthetic record" is used because it literally
describes one feature of standard hydrologic practice. It is
customary to use the data from a long record of discharges to
obtain estimates of the discharges in another stream for the
corresponding dates, and to publish the resulting record.

2. Statistical Model and Assumptions

It is assumed that the simultaneous discharges $(X, Y)$ from two
streams have a joint normal distribution with parameters $\mu_x$,
$\mu_y$, $\sigma_x^2$, $\sigma_y^2$, and
$\rho = \sigma_{xy}/(\sigma_x \sigma_y)$. It is further assumed
that pairs of values $(X, Y)$ obtained at different times are
independent. Let $X$ denote the discharge for the stream with the
long record. We are concerned with estimation of $\mu_y$ and
$\sigma_y^2$. The data given are pairs of observations

        $(X_1, Y_1), \ldots, (X_{n_1}, Y_{n_1})$

for the period covered by the short record, and $n_2$ additional
values

        $X_{n_1+1}, \ldots, X_{n_1+n_2}$

from the long record.

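The "synthetic values" of $Y$ are estimated from a regression
equation fitted to the $n_1$ paired observations,

        $\hat{Y}_{n_1+j} = \bar{Y}_1 + b\,(X_{n_1+j} - \bar{X}_1), \qquad j = 1, \ldots, n_2,$

where

        $n_1 \bar{X}_1 = X_1 + \cdots + X_{n_1},$

        $n_1 \bar{Y}_1 = Y_1 + \cdots + Y_{n_1},$

        $b = \sum_{i=1}^{n_1} (X_i - \bar{X}_1)(Y_i - \bar{Y}_1) \Big/ \sum_{i=1}^{n_1} (X_i - \bar{X}_1)^2 .$

The mean $\mu_y$ is estimated by the mean of observed and
synthetic values of $Y$ combined,

        $U = \bar{Y}_1 + \frac{n_2}{n_1 + n_2}\, b\,(\bar{X}_2 - \bar{X}_1),$

where

        $n_2 \bar{X}_2 = X_{n_1+1} + \cdots + X_{n_1+n_2}.$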
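The construction of the synthetic values and of $U$ can be sketched in a few lines of Python. This is only an illustration under the definitions above: the function name `synthetic_record` and the sample numbers are made up for the example and do not come from the report.

```python
def synthetic_record(x_short, y_short, x_extra):
    """Fit Y on X over the paired period, then extend Y over the extra X values.

    x_short, y_short: the n1 paired observations (short-record period).
    x_extra: the n2 additional X values from the long record.
    Returns (b, y_hat, u): the slope, the synthetic Y values, and the
    combined-mean estimator U.
    """
    n1, n2 = len(x_short), len(x_extra)
    x1_bar = sum(x_short) / n1
    y1_bar = sum(y_short) / n1
    sxx = sum((x - x1_bar) ** 2 for x in x_short)
    sxy = sum((x - x1_bar) * (y - y1_bar) for x, y in zip(x_short, y_short))
    b = sxy / sxx
    # Synthetic values: regression line evaluated at the extra X values.
    y_hat = [y1_bar + b * (x - x1_bar) for x in x_extra]
    x2_bar = sum(x_extra) / n2
    u = y1_bar + (n2 / (n1 + n2)) * b * (x2_bar - x1_bar)
    return b, y_hat, u


# Made-up illustrative data (not from the report).
x_short = [1.0, 2.0, 3.0, 4.0]
y_short = [2.1, 3.9, 6.2, 7.8]
x_extra = [5.0, 6.0]
b, y_hat, u = synthetic_record(x_short, y_short, x_extra)

# U is, by construction, the plain mean of observed and synthetic Y values.
combined_mean = (sum(y_short) + sum(y_hat)) / (len(y_short) + len(y_hat))
```

The closed form for `u` and the direct average `combined_mean` agree up to rounding, which is the sense in which $U$ is "the mean of observed and synthetic values combined".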


The variance of $U$ has been obtained by Thomas (*), who has
discussed the properties of the estimator $U$ with reference to
values of $n_1$, $n_2$, and $\rho$. The purpose of this note is to
investigate some possible estimators for $\sigma_y^2$.


(*) H. A. Thomas, Jr., "Correlation Techniques for Augmenting
Stream Runoff Information", manuscript.

3. Some "Natural" Estimators for $\sigma_y^2$

First, three functions of the observations are defined.

(3.1)   $S_1^2 = \sum_{i=1}^{n_1} (Y_i - \bar{Y}_1)^2 ,$

(3.2)   $S_2^2 = \sum_{j=1}^{n_2} (\hat{Y}_{n_1+j} - \bar{\hat{Y}}_2)^2 = b^2 \sum_{j=1}^{n_2} (X_{n_1+j} - \bar{X}_2)^2 ,$

(3.3)   $S_3^2 = \sum_{i=1}^{n_1} (Y_i - U)^2 + \sum_{j=1}^{n_2} (\hat{Y}_{n_1+j} - U)^2 ,$

where

        $n_2 \bar{\hat{Y}}_2 = \hat{Y}_{n_1+1} + \cdots + \hat{Y}_{n_1+n_2} .$


Three estimators which seem to be likely candidates are as
follows:

(3.4)   $T_1 = S_1^2 / (n_1 - 1) ,$

(3.5)   $T_2 = (S_1^2 + S_2^2) / (n_1 + n_2 - 2) ,$

(3.6)   $T_3 = S_3^2 / (n_1 + n_2 - 1) .$

The first of these, $T_1$, is the usual unbiased estimator based
on the observed values of $Y$. $T_3$ is the estimator which would
be calculated if the fact were ignored that some of the $Y$ values
were calculated. $T_2$ provides an alternative way of combining
observed and calculated $Y$ values.
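As a concrete illustration of the three candidates, the following Python sketch evaluates $T_1$, $T_2$ and $T_3$ from their definitions as sums of squares about $\bar{Y}_1$, about the mean of the synthetic values, and about $U$. The function name `variance_estimators` and the data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def variance_estimators(x_short, y_short, x_extra):
    """Return (T1, T2, T3) for paired data plus extra long-record X values."""
    n1, n2 = len(x_short), len(x_extra)
    x1_bar = sum(x_short) / n1
    y1_bar = sum(y_short) / n1
    b = (sum((x - x1_bar) * (y - y1_bar) for x, y in zip(x_short, y_short))
         / sum((x - x1_bar) ** 2 for x in x_short))
    y_hat = [y1_bar + b * (x - x1_bar) for x in x_extra]  # synthetic values
    y2_hat_bar = sum(y_hat) / n2                          # mean of synthetic values
    u = (sum(y_short) + sum(y_hat)) / (n1 + n2)           # combined mean U

    s1 = sum((y - y1_bar) ** 2 for y in y_short)          # observed Y about its mean
    s2 = sum((y - y2_hat_bar) ** 2 for y in y_hat)        # synthetic Y about its mean
    s3 = (sum((y - u) ** 2 for y in y_short)
          + sum((y - u) ** 2 for y in y_hat))             # all values about U

    t1 = s1 / (n1 - 1)
    t2 = (s1 + s2) / (n1 + n2 - 2)
    t3 = s3 / (n1 + n2 - 1)
    return t1, t2, t3


# Made-up illustrative data (not from the report).
t1, t2, t3 = variance_estimators([1.0, 2.0, 3.0, 4.0],
                                 [2.1, 3.9, 6.2, 7.8],
                                 [5.0, 6.0])
```

With so few points the three values differ widely; the example is meant only to show the mechanics, not typical behavior for discharge records.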

Observe that there is a relation among these estimators, since

(3.7)   $S_3^2 = S_1^2 + S_2^2 + \frac{n_1 n_2}{n_1 + n_2}\,(\bar{Y}_1 - \bar{\hat{Y}}_2)^2 .$

Each of the estimators $T_2$, $T_3$ is biased, tending to give low
values for the variance of $Y$. It will be seen, however, that
they can be preferable to $T_1$ for sufficiently large $\rho$.


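(3.8)   $E\,T_1 = \sigma_y^2 ,$

(3.9)   $E\,T_2 = \sigma_y^2 - \frac{(n_2 - 1)(n_1 - 4)}{(n_1 + n_2 - 2)(n_1 - 3)}\,(1 - \rho^2)\,\sigma_y^2 ,$

(3.10)  $E\,T_3 = \sigma_y^2 - \frac{n_2 (n_1 - 4)}{(n_1 + n_2 - 1)(n_1 - 3)}\,(1 - \rho^2)\,\sigma_y^2 .$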
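The origin of the bias term in $E\,T_2$ can be sketched as follows. The intermediate steps are a reconstruction under the normal model of Section 2 and are not taken from the report. They use the standard fact that $E(1/\chi^2_\nu) = 1/(\nu - 2)$ for $\nu > 2$, so the argument requires $n_1 > 3$.

```latex
\begin{align*}
E\bigl(b^2 \mid X_1,\dots,X_{n_1}\bigr)
  &= \rho^2\frac{\sigma_y^2}{\sigma_x^2}
     + \frac{(1-\rho^2)\,\sigma_y^2}{\sum_{i=1}^{n_1}(X_i-\bar X_1)^2}, \\
E\bigl(b^2\bigr)
  &= \frac{\sigma_y^2}{\sigma_x^2}
     \left[\rho^2 + \frac{1-\rho^2}{n_1-3}\right], \\
E\bigl(S_2^2\bigr)
  &= E\bigl(b^2\bigr)\,(n_2-1)\,\sigma_x^2
   = (n_2-1)\,\sigma_y^2\left[\rho^2 + \frac{1-\rho^2}{n_1-3}\right], \\
E\bigl(T_2\bigr)
  &= \frac{(n_1-1)\,\sigma_y^2 + E(S_2^2)}{n_1+n_2-2}
   = \sigma_y^2
     - \frac{(n_2-1)(n_1-4)}{(n_1+n_2-2)(n_1-3)}\,(1-\rho^2)\,\sigma_y^2 .
\end{align*}
```

The same steps with $n_2 - 1$ replaced by $n_2$ and the divisor $n_1 + n_2 - 1$ give the corresponding expression for $E\,T_3$, because the extra between-group term in $S_3^2$ contributes one more multiple of $E(b^2)\,\sigma_x^2$.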


In order to make comparisons among these estimators, the variance
of $T_1$ and the mean-squared-errors of $T_2$ and $T_3$ were
calculated. These are given in formulas (3.11) - (3.13).


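(3.11)  $\mathrm{Var}(T_1) = 2\sigma_y^4 / (n_1 - 1) ,$

(3.12)  $\mathrm{MSE}(T_2) = \mathrm{Var}(T_1) + \frac{(n_2 - 1)}{(N-2)^2}\,\sigma_y^4 \left[ 2A + (n_2 + 1)B + (N-2)C - \frac{(n_1 + 1)(2n_1 + n_2 - 3)}{n_1 - 1} \right] ,$

(3.13)  $\mathrm{MSE}(T_3) = \mathrm{Var}(T_1) + \frac{n_2}{(N-1)^2}\,\sigma_y^4 \left[ 2A + (n_2 + 2)B + (N-1)C - \frac{(n_1 + 1)(2n_1 + n_2 - 2)}{n_1 - 1} \right] ,$

where $N = n_1 + n_2$,

(3.14)  $A = (n_1 - 1)\rho^4 + (n_1 + 4)\rho^2(1 - \rho^2) + \frac{n_1 + 1}{n_1 - 3}\,(1 - \rho^2)^2 ,$

(3.15)  $B = \rho^4 + \frac{6}{n_1 - 3}\,\rho^2(1 - \rho^2) + \frac{3}{(n_1 - 3)(n_1 - 5)}\,(1 - \rho^2)^2 ,$

(3.16)  $C = 2\,\frac{n_1 - 4}{n_1 - 3}\,(1 - \rho^2) .$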


For further abbreviation, $Q_k$ will denote the quantity in square
brackets in the expression for $\mathrm{MSE}(T_k)$, $k = 2, 3$.

The "information" ratios are the reciprocals of the following:

(3.17)  $\frac{\mathrm{MSE}(T_2)}{\mathrm{Var}(T_1)} = 1 + \frac{(n_1 - 1)(n_2 - 1)}{2(N-2)^2}\, Q_2 ,$

(3.18)  $\frac{\mathrm{MSE}(T_3)}{\mathrm{Var}(T_1)} = 1 + \frac{(n_1 - 1)\, n_2}{2(N-1)^2}\, Q_3 .$

The notation

        $I_k(\rho, n_1, n_2) = \frac{\mathrm{Var}(T_1)}{\mathrm{MSE}(T_k)} , \qquad k = 2, 3 ,$

will be used.
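For numerical work the ratios are easy to evaluate. The sketch below transcribes the quantities $A$, $B$, $C$, $Q_k$ and $I_k$ of this section into Python. The coefficients are as read from the scanned formulas and cross-checked by derivation under the normal model, so they should be treated as a reconstruction. The function names `abc_terms` and `information_ratio` are illustrative, and $n_1 > 5$ is assumed so that every denominator is positive.

```python
def abc_terms(rho, n1):
    """Quantities A, B, C of (3.14)-(3.16) as functions of rho and n1 (n1 > 5)."""
    r2 = rho * rho          # rho^2
    t2 = 1.0 - r2           # 1 - rho^2
    a = (n1 - 1) * r2 * r2 + (n1 + 4) * r2 * t2 + (n1 + 1) / (n1 - 3) * t2 * t2
    b = r2 * r2 + 6.0 / (n1 - 3) * r2 * t2 + 3.0 / ((n1 - 3) * (n1 - 5)) * t2 * t2
    c = 2.0 * (n1 - 4) / (n1 - 3) * t2
    return a, b, c


def information_ratio(k, rho, n1, n2):
    """I_k(rho, n1, n2) = Var(T1) / MSE(Tk) for k = 2 or 3."""
    if k not in (2, 3):
        raise ValueError("k must be 2 or 3")
    if n1 <= 5:
        raise ValueError("the formulas assume n1 > 5")
    n = n1 + n2
    a, b, c = abc_terms(rho, n1)
    if k == 2:
        q = 2 * a + (n2 + 1) * b + (n - 2) * c - (n1 + 1) * (2 * n1 + n2 - 3) / (n1 - 1)
        mse_over_var = 1.0 + (n1 - 1) * (n2 - 1) / (2.0 * (n - 2) ** 2) * q
    else:
        q = 2 * a + (n2 + 2) * b + (n - 1) * c - (n1 + 1) * (2 * n1 + n2 - 2) / (n1 - 1)
        mse_over_var = 1.0 + (n1 - 1) * n2 / (2.0 * (n - 1) ** 2) * q
    return 1.0 / mse_over_var


# Example with n1 = n2 = 15: perfect correlation versus rho = 0.7.
i2_at_one = information_ratio(2, 1.0, 15, 15)
i3_at_one = information_ratio(3, 1.0, 15, 15)
i3_at_07 = information_ratio(3, 0.7, 15, 15)
```

A value above 1 means the estimator has smaller mean squared error than $T_1$; a value below 1 means $T_1$ is better.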


4.  Properties  of  the  Information  Ratios 

The  following  general  properties  hold  for  the  two 
information  ratios.  (Where  the  subscript  is  dropped,  the 
same  statement  holds  for  both  cases.) 


(4.1)   $I_2(1, n_1, n_2) = 1 + (n_2 - 1)/(n_1 - 1)$.

(4.2)   $I_3(1, n_1, n_2) = 1 + n_2/(n_1 - 1)$.

(4.3)   $I(\rho, \infty, n_2) = 1$ for all $n_2$.

(4.4)   $I(0, n_1, n_2) = O(1/n_1)$ as $n_1 \to \infty$ with the
        ratio $n_2/n_1$ fixed, or with the difference $(n_2 - n_1)$
        fixed.

(4.5)   $\rho_0 \to 1$ as $n_1 \to \infty$, with [condition
        illegible in source], where $\rho_0$ is the value of $\rho$
        for which $I(\rho, n_1, n_2) = 1$.
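As a worked check of the first property, one can substitute $\rho = 1$ into the bracketed quantity $Q_2$ of Section 3. The steps below are a sketch and assume the expressions for $A$, $B$, $C$ and $Q_2$ in the form $Q_2 = 2A + (n_2+1)B + (N-2)C - (n_1+1)(2n_1+n_2-3)/(n_1-1)$, with $N = n_1 + n_2$.

```latex
\begin{align*}
\rho = 1:\qquad & A = n_1 - 1, \qquad B = 1, \qquad C = 0, \\
Q_2 &= 2(n_1-1) + (n_2+1) - \frac{(n_1+1)(2n_1+n_2-3)}{n_1-1}
     = -\frac{2(N-2)}{n_1-1}, \\
\frac{\mathrm{MSE}(T_2)}{\mathrm{Var}(T_1)}
    &= 1 + \frac{(n_1-1)(n_2-1)}{2(N-2)^2}\,Q_2
     = 1 - \frac{n_2-1}{N-2}
     = \frac{n_1-1}{N-2}, \\
I_2(1, n_1, n_2) &= \frac{N-2}{n_1-1} = 1 + \frac{n_2-1}{n_1-1}.
\end{align*}
```

This agrees with the direct argument: when $\rho = 1$ the synthetic values are exact, so $T_2$ is an unbiased variance estimate on $N - 2$ degrees of freedom, against $n_1 - 1$ for $T_1$.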

5.  Conclusions 

From (4.4) and (4.5) it is clear that $T_1$ would be the
preferred estimator for very large $n_1$, and even for moderately
large $n_1$ if the value of $\rho$ is not believed to be very
close to unity.

For the numerical values of $n_1$, $n_2$ which would apply in the
case of discharge records, the properties of $T_2$ and $T_3$ are
essentially indistinguishable. $T_3$ would probably be preferred
on grounds of convenience, if either were to be used.

The table below shows how large $\rho$ would have to be in order
that $T_2$ or $T_3$ be as good as $T_1$ in the sense
$I(\rho, n_1, n_2) = 1$.

Values of $\rho$ for which $I(\rho, n_1, n_2) = 1$

    N = n_1 + n_2      n_1      \rho_0
    -------------      ---      ------
         30             15        .8
                        20        .7
         40             15        .8
                        20        .8
                        25        .8
                        30        .8
        360            180        .9
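Break-even values of this kind can be approximated by solving $Q_k(\rho) = 0$, since $I_k = 1$ exactly when the bracketed quantity $Q_k$ vanishes. The Python sketch below does this by bisection on $0 \le \rho \le 1$, using the Section 3 expressions as reconstructed from the scan. The names `q_bracket` and `rho_zero` are illustrative. The report does not say whether the tabulated values refer to $T_2$ or $T_3$, or how they were rounded, so agreement should be expected only to about one decimal place.

```python
def q_bracket(k, rho, n1, n2):
    """Bracketed quantity Q_k from the MSE expressions (k = 2 or 3, n1 > 5)."""
    n = n1 + n2
    r2 = rho * rho
    t2 = 1.0 - r2
    a = (n1 - 1) * r2 * r2 + (n1 + 4) * r2 * t2 + (n1 + 1) / (n1 - 3) * t2 * t2
    b = r2 * r2 + 6.0 / (n1 - 3) * r2 * t2 + 3.0 / ((n1 - 3) * (n1 - 5)) * t2 * t2
    c = 2.0 * (n1 - 4) / (n1 - 3) * t2
    if k == 2:
        return 2 * a + (n2 + 1) * b + (n - 2) * c - (n1 + 1) * (2 * n1 + n2 - 3) / (n1 - 1)
    if k == 3:
        return 2 * a + (n2 + 2) * b + (n - 1) * c - (n1 + 1) * (2 * n1 + n2 - 2) / (n1 - 1)
    raise ValueError("k must be 2 or 3")


def rho_zero(k, n1, n2, tol=1e-10):
    """Value of rho in (0, 1) at which I_k(rho, n1, n2) = 1, found by bisection."""
    lo, hi = 0.0, 1.0
    # Q_k is a quadratic in rho^2; one sign change on [0, 1] means one root.
    if not (q_bracket(k, lo, n1, n2) > 0.0 > q_bracket(k, hi, n1, n2)):
        raise ValueError("no sign change of Q_k on [0, 1] for these sample sizes")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q_bracket(k, mid, n1, n2) > 0.0:
            lo = mid   # T_k still worse than T_1 here, so the root lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Break-even correlations for two of the tabulated sample sizes.
r_30_15 = rho_zero(3, 15, 15)       # N = 30,  n1 = 15
r_360_180 = rho_zero(3, 180, 180)   # N = 360, n1 = 180
```

For $N = 30$, $n_1 = 15$ the root falls between 0.7 and 0.8, and for $N = 360$, $n_1 = 180$ it falls between 0.9 and 0.95, in line with the tabulated .8 and .9.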